A list of packages and libraries is imported:
import geopandas as gpd
import pandas as pd
import plotly.express as px
The urban population and total population shapefiles are read with GeoPandas using the ".read_file()" function.
gdf_urban_shp = gpd.read_file('urban_population.shp')
gdf_urban_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | 0.21177 | 0.22078 | 0.23737 | POLYGON ((7404817.438 4463862.784, 7466841.908... |
| 1 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | 0.37144 | 0.50087 | 0.59783 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... |
| 2 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | 0.36428 | 0.41741 | 0.52163 | POLYGON ((2339940.185 4961221.199, 2337708.178... |
| 3 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7866602.0 | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | 0.79051 | 0.80236 | 0.84087 | POLYGON ((5741805.754 2765811.385, 5761611.935... |
| 4 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 38990109.0 | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | 0.86984 | 0.89142 | 0.90849 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... |
5 rows × 68 columns
gdf_total_shp = gpd.read_file('total_population.shp')
gdf_total_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | 0.21177 | 0.22078 | 0.23737 | POLYGON ((7404817.438 4463862.784, 7466841.908... |
| 1 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | 0.37144 | 0.50087 | 0.59783 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... |
| 2 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | 0.36428 | 0.41741 | 0.52163 | POLYGON ((2339940.185 4961221.199, 2337708.178... |
| 3 | United Arab Emirates | ARE | Population, total | SP.POP.TOTL | 92418.0 | 100796.0 | 112118.0 | 125130.0 | 138039.0 | 149857.0 | ... | 9214175.0 | 9262900.0 | 9360980.0 | 9487203.0 | 9630959.0 | 9770529.0 | 0.79051 | 0.80236 | 0.84087 | POLYGON ((5741805.754 2765811.385, 5761611.935... |
| 4 | Argentina | ARG | Population, total | SP.POP.TOTL | 20481779.0 | 20817266.0 | 21153052.0 | 21488912.0 | 21824425.0 | 22159650.0 | ... | 42669500.0 | 43131966.0 | 43590368.0 | 44044811.0 | 44494502.0 | 44938712.0 | 0.86984 | 0.89142 | 0.90849 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... |
5 rows × 68 columns
total_country_a = gdf_total_shp[gdf_total_shp['2010']>290000000]
total_country_a.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | China | CHN | Population, total | SP.POP.TOTL | 667070000.0 | 660330000.0 | 665770000.0 | 682335000.0 | 698355000.0 | 715185000.0 | ... | 1.364270e+09 | 1.371220e+09 | 1.378665e+09 | 1.386395e+09 | 1.392730e+09 | 1.397715e+09 | 0.26442 | 0.35877 | 0.49226 | MULTIPOLYGON (((12186724.586 2047364.867, 1209... |
| 67 | India | IND | Population, total | SP.POP.TOTL | 450547679.0 | 459642165.0 | 469077190.0 | 478825608.0 | 488848135.0 | 499123324.0 | ... | 1.295604e+09 | 1.310152e+09 | 1.324510e+09 | 1.338659e+09 | 1.352617e+09 | 1.366418e+09 | 0.25547 | 0.27667 | 0.30930 | POLYGON ((10834404.758 3261766.224, 10842803.5... |
| 154 | United States | USA | Population, total | SP.POP.TOTL | 180671000.0 | 183691000.0 | 186538000.0 | 189242000.0 | 191889000.0 | 194303000.0 | ... | 3.183010e+08 | 3.206352e+08 | 3.229413e+08 | 3.249855e+08 | 3.266875e+08 | 3.282395e+08 | 0.75300 | 0.79057 | 0.80772 | MULTIPOLYGON (((-13674486.249 6242596.000, -13... |
3 rows × 68 columns
f = px.choropleth(total_country_a,
locationmode = 'country names',
locations = total_country_a['Country Na'],
color = total_country_a['UPC_2010'],
labels={'UPC_2010':'Urban Per Capita (2010)'},
projection="mercator")
f.update_layout(title_text = 'TOTAL POPULATION GREATER THAN "290000000" IN 2010',title_x=0.45,
width=900, height=700)
f.show()
For 2010, the countries with a total population greater than "290000000" are China, India and the United States. Their urban population per capita (the share of the population living in urban areas) is "0.49226", "0.3093" and "0.80772" respectively. In other words, in 2010 around 31% of India's population lived in urban areas, about half of China's population did, and around 80% of the population of the United States did, which suggests that the United States was more developed than India and China.
total_country_b = gdf_total_shp[gdf_total_shp['2010']<69000000]
total_country_b.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | 0.21177 | 0.22078 | 0.23737 | POLYGON ((7404817.438 4463862.784, 7466841.908... |
| 1 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | 0.37144 | 0.50087 | 0.59783 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... |
| 2 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | 0.36428 | 0.41741 | 0.52163 | POLYGON ((2339940.185 4961221.199, 2337708.178... |
| 3 | United Arab Emirates | ARE | Population, total | SP.POP.TOTL | 92418.0 | 100796.0 | 112118.0 | 125130.0 | 138039.0 | 149857.0 | ... | 9214175.0 | 9262900.0 | 9360980.0 | 9487203.0 | 9630959.0 | 9770529.0 | 0.79051 | 0.80236 | 0.84087 | POLYGON ((5741805.754 2765811.385, 5761611.935... |
| 4 | Argentina | ARG | Population, total | SP.POP.TOTL | 20481779.0 | 20817266.0 | 21153052.0 | 21488912.0 | 21824425.0 | 22159650.0 | ... | 42669500.0 | 43131966.0 | 43590368.0 | 44044811.0 | 44494502.0 | 44938712.0 | 0.86984 | 0.89142 | 0.90849 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... |
5 rows × 68 columns
total_country_b_max = total_country_b[total_country_b['UPC_2010']==total_country_b['UPC_2010'].max()]
total_country_b_max
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 123 | Qatar | QAT | Population, total | SP.POP.TOTL | 47384.0 | 51421.0 | 56262.0 | 61716.0 | 67566.0 | 73633.0 | ... | 2459198.0 | 2565710.0 | 2654374.0 | 2724724.0 | 2781677.0 | 2832067.0 | 0.927859 | 0.96311 | 0.98501 | POLYGON ((5656155.380 2827764.198, 5648786.307... |
1 rows × 68 columns
total_country_b_min = total_country_b[total_country_b['UPC_2010']==total_country_b['UPC_2010'].min()]
total_country_b_min
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Burundi | BDI | Population, total | SP.POP.TOTL | 2797932.0 | 2852438.0 | 2907321.0 | 2964427.0 | 3026290.0 | 3094379.0 | ... | 9844297.0 | 10160030.0 | 10487998.0 | 10827024.0 | 11175378.0 | 11530580.0 | 0.06271 | 0.08246 | 0.10642 | POLYGON ((3391868.555 -266990.291, 3398323.566... |
1 rows × 68 columns
f = px.choropleth(total_country_b,
locationmode = 'country names',
locations = total_country_b['Country Na'],
color = total_country_b['UPC_2010'],
labels={'UPC_2010':'Urban Per Capita (2010)'},
projection="mercator")
f.update_layout(title_text = 'TOTAL POPULATION LESS THAN "69000000" IN 2010',title_x=0.45,
width=900, height=700)
f.show()
As the choropleth map above shows, many countries had a total population of less than "69000000" in 2010. The country with the highest urban population per capita in 2010 was Qatar, at 98.5%. Although Qatar is a small country, its 98.5% urban share suggests a highly developed and advanced country.
Many countries fall in the range of 80% to 95% urban population per capita in 2010, such as Uruguay (94.4%), Iceland (93.5%), Argentina (90.8%), Australia (85.2%), New Zealand (86.16%), Chile (87.07%), Sweden (85.05%), Finland (83.77%), Greenland (84.38%) and Canada (80.9%). On this measure alone, these countries were less urbanized than Qatar in that year.
The country with the lowest urban population per capita in 2010 was Burundi, a small East African country where only 10.6% of the population lived in urban areas. This means that around 90% of its people lived in rural areas.
urban_country = gdf_urban_shp[(gdf_urban_shp['2010']>110146163) & (gdf_urban_shp['2010']<223096279)]
urban_country.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Brazil | BRA | Urban population | SP.URB.TOTL | 33302773.0 | 35016991.0 | 36802627.0 | 38660045.0 | 40580114.0 | 42551349.0 | ... | 173346772.0 | 175375436.0 | 177386818.0 | 179379301.0 | 181335507.0 | 183241641.0 | 0.73922 | 0.81192 | 0.84335 | POLYGON ((-5941528.839 -3973993.738, -5972351.... |
| 66 | Indonesia | IDN | Urban population | SP.URB.TOTL | 12799371.0 | 13353483.0 | 13931417.0 | 14536390.0 | 15169460.0 | 15831166.0 | ... | 134287151.0 | 137751865.0 | 141210511.0 | 144652795.0 | 148084795.0 | 151509724.0 | 0.30584 | 0.42002 | 0.49914 | MULTIPOLYGON (((15696071.624 -287609.878, 1569... |
| 76 | Japan | JPN | Urban population | SP.URB.TOTL | 58526962.0 | 60965749.0 | 62428798.0 | 63957880.0 | 65516029.0 | 67107937.0 | ... | 116208079.0 | 116182717.0 | 116145370.0 | 116053379.0 | 115920900.0 | 115782416.0 | 0.77339 | 0.78649 | 0.90812 | MULTIPOLYGON (((15794521.520 4720612.890, 1569... |
3 rows × 68 columns
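The filter above combines two comparisons with "&". Each comparison needs its own parentheses, because "&" binds more tightly than ">" and "<" in Python. The minimal sketch below uses a hypothetical toy table with made-up names and values, not the real dataset. It shows that "Series.between" with inclusive="neither" selects the same rows. The string form of "inclusive" assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical toy data standing in for gdf_urban_shp; the values are made up.
df = pd.DataFrame({"Country Na": ["A", "B", "C", "D"],
                   "2010": [50.0, 120.0, 200.0, 300.0]})
low, high = 100.0, 250.0

# Two strict comparisons joined with &; each one needs its own parentheses.
mask_and = (df["2010"] > low) & (df["2010"] < high)

# Equivalent strict range test: inclusive="neither" excludes both bounds.
mask_between = df["2010"].between(low, high, inclusive="neither")

selected = df[mask_between]
print(selected["Country Na"].tolist())  # ['B', 'C']
```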
f = px.choropleth(urban_country,
locationmode = 'country names',
locations = urban_country['Country Na'],
color = urban_country['UPC_2010'],
labels={'UPC_2010':'Urban Per Capita (2010)'},
projection="mercator")
f.update_layout(title_text = 'URBAN POPULATION BETWEEN "110146163" AND "223096279" IN 2010',title_x=0.5,
width=900, height=700)
f.show()
For 2010, the countries with an urban population between "110146163" and "223096279" are Brazil, Indonesia and Japan. Their urban population per capita is "0.84335", "0.49914" and "0.90812" respectively. In other words, in 2010 around 84% of Brazil's population lived in urban areas, about half of Indonesia's population did, and around 90% of Japan's population did, which suggests that Japan was a well-developed country.
The ".max()" method of pandas is used to find the country with the highest total population in 2010.
country = gdf_total_shp[gdf_total_shp['2010']==gdf_total_shp['2010'].max()]
country
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | China | CHN | Population, total | SP.POP.TOTL | 667070000.0 | 660330000.0 | 665770000.0 | 682335000.0 | 698355000.0 | 715185000.0 | ... | 1.364270e+09 | 1.371220e+09 | 1.378665e+09 | 1.386395e+09 | 1.392730e+09 | 1.397715e+09 | 0.26442 | 0.35877 | 0.49226 | MULTIPOLYGON (((12186724.586 2047364.867, 1209... |
1 rows × 68 columns
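Comparing a column with its ".max()" keeps every row that ties for the maximum. When a single row is enough, "Series.idxmax()" returns the index label of the first maximum, and ".loc" can then look it up. The sketch below uses a hypothetical toy table with illustrative values only.

```python
import pandas as pd

# Hypothetical toy population table; the values are illustrative only.
pop = pd.DataFrame({"Country Na": ["X", "Y", "Z"],
                    "2010": [5.0e6, 1.3e9, 3.1e8]},
                   index=[10, 28, 154])

# Boolean-mask approach: keeps all rows equal to the maximum.
by_mask = pop[pop["2010"] == pop["2010"].max()]

# idxmax returns the index label of the (first) largest value.
label = pop["2010"].idxmax()
by_label = pop.loc[[label]]  # double brackets keep a one-row DataFrame

print(label)  # 28
```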
country = country.copy()  # explicit copy, so adding a column does not raise SettingWithCopyWarning
country['Percentage Change'] = (country['UPC_2010'] - country['UPC_1990']) / country['UPC_1990']
country
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry | Percentage Change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | China | CHN | Population, total | SP.POP.TOTL | 667070000.0 | 660330000.0 | 665770000.0 | 682335000.0 | 698355000.0 | 715185000.0 | ... | 1.371220e+09 | 1.378665e+09 | 1.386395e+09 | 1.392730e+09 | 1.397715e+09 | 0.26442 | 0.35877 | 0.49226 | MULTIPOLYGON (((12186724.586 2047364.867, 1209... | 0.861659 |
1 rows × 69 columns
The relative change in urban population per capita from 1990 to 2010, for the country with the highest population in 2010, is stored as a fraction in the column "Percentage Change". The value 0.861659 corresponds to about 86.2%.
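The "Percentage Change" column holds the relative change (UPC_2010 - UPC_1990) / UPC_1990 as a fraction. Multiplying it by 100 gives a percentage. Adding a column to a frame obtained by boolean filtering can trigger pandas' SettingWithCopyWarning, and taking an explicit ".copy()" of the filtered rows first avoids it.

The minimal sketch below uses China's UPC_1990 and UPC_2010 values from the table above. The second row and the "2010" populations are made-up placeholders.

```python
import pandas as pd

# Toy parent table: the UPC values for China come from the table above;
# the second row and the 2010 populations are placeholders.
df = pd.DataFrame({"Country Na": ["China", "Other"],
                   "2010": [1.34e9, 5.0e6],
                   "UPC_1990": [0.26442, 0.50000],
                   "UPC_2010": [0.49226, 0.60000]})

# Filter, then copy, so the new column goes into an independent frame.
top = df[df["2010"] == df["2010"].max()].copy()

# Relative change as a fraction, then the same value as a percentage.
top["Percentage Change"] = (top["UPC_2010"] - top["UPC_1990"]) / top["UPC_1990"]
pct = top["Percentage Change"].iloc[0] * 100
print(round(pct, 2))  # 86.17
```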
f = px.choropleth(country,
locationmode = 'country names',
locations = country['Country Na'],
color = country['Percentage Change'],
labels={'Percentage Change': 'Percentage change (1990 to 2010)'},
projection="mercator")
f.update_layout(title_text = 'PERCENTAGE CHANGE IN URBAN POPULATION PER CAPITA FROM 1990 TO 2010',title_x=0.5,
width=900, height=700)
f.show()
The country with the highest population in 2010 is "China". Its urban population per capita rose from 26.4% in 1990 to 49.2% in 2010, a relative increase of about 86.2%. This points to rapid development in China over the 20 years from 1990 to 2010.
The pandas "Index.get_loc()" method returns the integer location, slice or boolean mask for a requested label. It works with both sorted and unsorted indexes. Here it is used to list the position of every column, so that the year columns can later be selected by position.
a = {gdf_urban_shp.columns.get_loc(c): c for c in gdf_urban_shp.columns}
a
{0: 'Country Na',
1: 'Country Co',
2: 'Indicator',
3: 'Indicato_1',
4: '1960',
5: '1961',
6: '1962',
7: '1963',
8: '1964',
9: '1965',
10: '1966',
11: '1967',
12: '1968',
13: '1969',
14: '1970',
15: '1971',
16: '1972',
17: '1973',
18: '1974',
19: '1975',
20: '1976',
21: '1977',
22: '1978',
23: '1979',
24: '1980',
25: '1981',
26: '1982',
27: '1983',
28: '1984',
29: '1985',
30: '1986',
31: '1987',
32: '1988',
33: '1989',
34: '1990',
35: '1991',
36: '1992',
37: '1993',
38: '1994',
39: '1995',
40: '1996',
41: '1997',
42: '1998',
43: '1999',
44: '2000',
45: '2001',
46: '2002',
47: '2003',
48: '2004',
49: '2005',
50: '2006',
51: '2007',
52: '2008',
53: '2009',
54: '2010',
55: '2011',
56: '2012',
57: '2013',
58: '2014',
59: '2015',
60: '2016',
61: '2017',
62: '2018',
63: '2019',
64: 'UPC_1990',
65: 'UPC_2000',
66: 'UPC_2010',
67: 'geometry'}
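"get_loc" returns different types depending on the index:

- an integer for a unique label,
- a slice for a duplicated label in a sorted (monotonic) index,
- a boolean mask for duplicates in an unsorted index.

Because the shapefile's column names are unique, the same position-to-name mapping can also be built with "enumerate". The sketch below uses small toy indexes, not the real columns.

```python
import pandas as pd

# Unique labels: get_loc returns an integer position.
unique_idx = pd.Index(["Country Na", "1990", "2010", "geometry"])
assert unique_idx.get_loc("2010") == 2

# Duplicated label in a sorted index: get_loc returns a slice.
monotonic_idx = pd.Index(list("abbc"))
print(monotonic_idx.get_loc("b"))  # slice(1, 3, None)

# Duplicated label in an unsorted index: get_loc returns a boolean array
# that is True at positions 1 and 3.
non_monotonic_idx = pd.Index(list("abcb"))
print(non_monotonic_idx.get_loc("b"))

# For unique names, enumerate gives the same position -> name mapping.
mapping = dict(enumerate(unique_idx))
assert mapping == {unique_idx.get_loc(c): c for c in unique_idx}
```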
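The ".iloc" indexer in pandas selects rows and columns by integer position, in the order in which they appear in the dataframe. A positional slice excludes its end point, so "iloc[:,34:55]" selects positions 34 to 54, which the mapping above shows are the columns "1990" to "2010". The urban columns are divided by the matching total-population columns to give each year's urban share. This assumes both files list the countries in the same row order. ".mean(axis=1)" then averages those shares across each row.
gdf_urban_shp['mean_per_cap'] = (gdf_urban_shp.iloc[:, 34:55] / gdf_total_shp.iloc[:, 34:55]).mean(axis=1)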
gdf_urban_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry | mean_per_cap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | 0.21177 | 0.22078 | 0.23737 | POLYGON ((7404817.438 4463862.784, 7466841.908... | 0.222128 |
| 1 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | 0.37144 | 0.50087 | 0.59783 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... | 0.496477 |
| 2 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | 0.36428 | 0.41741 | 0.52163 | POLYGON ((2339940.185 4961221.199, 2337708.178... | 0.429190 |
| 3 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | 0.79051 | 0.80236 | 0.84087 | POLYGON ((5741805.754 2765811.385, 5761611.935... | 0.806279 |
| 4 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | 0.86984 | 0.89142 | 0.90849 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... | 0.890745 |
5 rows × 69 columns
For each country, the mean urban population per capita from 1990 to 2010 is stored in the column "mean_per_cap".
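"iloc[:, 34:55]" depends on hard-coded column positions. An alternative, assuming the year columns are unique string labels stored contiguously as in this dataset, is label-based slicing with ".loc", whose end point is inclusive. The toy frames below are hypothetical stand-ins for the urban and total tables, with made-up values. The sketch computes the same mean of yearly ratios.

```python
import pandas as pd

# Hypothetical stand-ins for gdf_urban_shp and gdf_total_shp (same rows, same order).
urban = pd.DataFrame({"Country Na": ["A", "B"],
                      "1989": [1.0, 1.0], "1990": [20.0, 60.0],
                      "2000": [30.0, 70.0], "2010": [40.0, 80.0],
                      "2011": [1.0, 1.0]})
total = pd.DataFrame({"Country Na": ["A", "B"],
                      "1989": [2.0, 2.0], "1990": [100.0, 100.0],
                      "2000": [100.0, 100.0], "2010": [100.0, 100.0],
                      "2011": [2.0, 2.0]})

# Label slicing with .loc includes both end points ('1990' through '2010').
years = urban.loc[:, "1990":"2010"].columns

# Element-wise ratio per year (aligned on row index and column labels),
# then the mean across each row (axis=1) gives one value per country.
ratio = urban[years] / total[years]
urban["mean_per_cap"] = ratio.mean(axis=1)

print(list(years))  # ['1990', '2000', '2010']
print(urban["mean_per_cap"].tolist())
```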
f = px.choropleth(gdf_urban_shp,
locationmode = 'country names',
locations = gdf_urban_shp['Country Na'],
color = gdf_urban_shp['mean_per_cap'],
color_continuous_scale="Viridis",
labels={'mean_per_cap' : 'Mean per capita'},
projection="mercator")
f.update_layout(title_text = 'Mean per capita from year 1990 to 2010',title_x=0.5,
width=900, height=700)
f.show()
The choropleth map above shows each country's mean urban population per capita from 1990 to 2010.
The ".iloc" indexer is used again to select the columns "1990" to "2010" (positions 34 to 54) of the total-population dataframe.
gdf_total_shp['Mean_WP'] = gdf_total_shp.iloc[:, 34:55].mean(axis=1)
gdf_total_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | UPC_1990 | UPC_2000 | UPC_2010 | geometry | Mean_WP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | 0.21177 | 0.22078 | 0.23737 | POLYGON ((7404817.438 4463862.784, 7466841.908... | 2.129546e+07 |
| 1 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | 0.37144 | 0.50087 | 0.59783 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... | 1.683680e+07 |
| 2 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | 0.36428 | 0.41741 | 0.52163 | POLYGON ((2339940.185 4961221.199, 2337708.178... | 3.095489e+06 |
| 3 | United Arab Emirates | ARE | Population, total | SP.POP.TOTL | 92418.0 | 100796.0 | 112118.0 | 125130.0 | 138039.0 | 149857.0 | ... | 9262900.0 | 9360980.0 | 9487203.0 | 9630959.0 | 9770529.0 | 0.79051 | 0.80236 | 0.84087 | POLYGON ((5741805.754 2765811.385, 5761611.935... | 3.857189e+06 |
| 4 | Argentina | ARG | Population, total | SP.POP.TOTL | 20481779.0 | 20817266.0 | 21153052.0 | 21488912.0 | 21824425.0 | 22159650.0 | ... | 43131966.0 | 43590368.0 | 44044811.0 | 44494502.0 | 44938712.0 | 0.86984 | 0.89142 | 0.90849 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... | 3.683281e+07 |
5 rows × 69 columns
For each country, the mean total population over the years 1990 to 2010 is stored in the column "Mean_WP". This is referred to below as the "mean world population".
The "pd.concat()" function with "axis=1" places columns from the two dataframes side by side, aligning them on the row index.
data_concat = pd.concat([gdf_urban_shp['mean_per_cap'], gdf_total_shp['Mean_WP']], axis=1)
data_concat
| | mean_per_cap | Mean_WP |
|---|---|---|
| 0 | 0.222128 | 2.129546e+07 |
| 1 | 0.496477 | 1.683680e+07 |
| 2 | 0.429190 | 3.095489e+06 |
| 3 | 0.806279 | 3.857189e+06 |
| 4 | 0.890745 | 3.683281e+07 |
| ... | ... | ... |
| 158 | 0.216376 | 1.883747e+05 |
| 159 | 0.263220 | 1.745547e+07 |
| 160 | 0.570134 | 4.455899e+07 |
| 161 | 0.371357 | 1.054246e+07 |
| 162 | 0.327325 | 1.173322e+07 |
163 rows × 2 columns
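"pd.concat(..., axis=1)" aligns its inputs on the row index rather than on position. The two columns therefore pair up correctly only because both dataframes shown above use the same 0 to 162 index. The toy sketch below uses made-up values. It shows the alignment and the NaN that appears when an index label is missing from one input.

```python
import pandas as pd

# Hypothetical toy series; the second one has no index label 2.
per_cap = pd.Series([0.22, 0.50, 0.43], index=[0, 1, 2], name="mean_per_cap")
mean_pop = pd.Series([2.1e7, 1.7e7], index=[0, 1], name="Mean_WP")

# axis=1 places the series side by side, matched on index labels.
combined = pd.concat([per_cap, mean_pop], axis=1)
print(combined)
# Row 2 holds NaN in Mean_WP because mean_pop has no label 2.
```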
A scatter plot is a simple way to see whether two numeric variables are related.
import seaborn as sns
sns.scatterplot(x="mean_per_cap", y="Mean_WP", data=data_concat);
The plot suggests that the relationship between "mean world population" and "mean per capita world urban population" is weak. Most points sit close to 0 on the y-axis because a few very populous countries, such as China and India, stretch the scale. The points also show no clear upward or downward trend.
To summarize this relationship in a single number, we can compute a correlation coefficient and display it in a correlation matrix. The coefficient describes the strength and direction of the linear relationship between two variables.
A correlation matrix is a simple way to summarize the correlations between all pairs of variables in a dataset. Our dataset contains the "Mean world population" and the "Mean per capita urban population", and it would be very difficult to understand how they relate by simply staring at the raw data. A correlation matrix lets us quickly see how each pair of variables is correlated.
We want to understand the relationship between "Mean world population" and "Mean per capita urban population". One way to quantify this relationship is to use the Pearson correlation coefficient, which is a measure of the linear association between two variables. It has a value between -1 and 1 where:
"-1" indicates a perfectly negative linear correlation between two variables.
"0" indicates no linear correlation between two variables.
"1" indicates a perfectly positive linear correlation between two variables.
The further away the correlation coefficient is from zero, the stronger the relationship between the two variables.
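For two variables x and y, the Pearson coefficient is their covariance divided by the product of their standard deviations:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²)

The short sketch below uses made-up numbers. It computes r directly with NumPy and checks the result against pandas' ".corr()", which uses the Pearson method by default.

```python
import numpy as np
import pandas as pd

# Made-up sample values for two variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Pearson r: sum of co-deviations over the root of the product of squared deviations.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same coefficient (method="pearson" is the default).
df = pd.DataFrame({"mean_per_cap": x, "Mean_WP": y})
r_pandas = df.corr().loc["mean_per_cap", "Mean_WP"]

print(round(r_manual, 4), round(r_pandas, 4))
```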
The pandas DataFrame ".corr()" method computes the correlation matrix (using the Pearson coefficient by default), and seaborn's "heatmap()" function plots it.
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize = (12,6))
ax=sns.heatmap(data_concat.corr(), annot=True)
Each cell in the plot above shows the correlation between two variables. The black cells show that the correlation between "Mean world population" and "Mean per capita" is -0.082, which indicates a very weak negative linear correlation. The cells along the diagonal equal 1 because each variable is perfectly correlated with itself.